Notebook prepared by Clare Gibson.
In this notebook I will:
In this notebook I use the following tools:
here for file location
tidyverse for data wrangling
ggplot2 for plotting
# Load packages
library(here)
library(tidyverse)
library(ggplot2)
In this notebook I use the Kaggle Perth House Prices dataset. The chunk below calls a script that loads the data into R and prepares it for the linear regression exercise.
# Load data
source(here("R/utils.R"))
# Store data in variable df
df <- model_1
# Show the head
head(df)
## # A tibble: 6 × 4
##   price  size bedrooms   age
##   <dbl> <dbl>    <dbl> <dbl>
## 1   565   160        4    15
## 2   365   139        3     6
## 3   287    86        3    36
## 4   255    59        2    65
## 5   325   131        4    18
## 6   409   118        4    22
In this exercise I predict house prices from several features. The training dataset contains 29938 examples, each with three features (size, bedrooms and age) and one target value (price). I will fit a linear regression model to these examples so that I can predict the price of other houses.
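As a rough sketch of what such a model looks like, a linear regression with these three features predicts the price as a weighted sum of the features plus an intercept. The symbols \(w_1, w_2, w_3\) (weights) and \(b\) (intercept) are notation assumed here for illustration; they do not appear in the text above.

\[
\widehat{\text{price}} = w_1 \cdot \text{size} + w_2 \cdot \text{bedrooms} + w_3 \cdot \text{age} + b
\]

Fitting the model means choosing the weights and intercept so that the predictions are close to the observed prices in the training data.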
The code chunk below creates the x_train and y_train variables.
# Create x_train and y_train
x_train <- as.matrix(df[,-1])
y_train <- as.matrix(df[,1])
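Selecting by position works here because price is the first column. As a minimal sketch, the same split can be written by column name so that it does not depend on column order. The names toy, feature_cols, x_toy and y_toy are hypothetical, and the toy data frame only reuses the first three rows printed by head(df) above; it is not the full dataset.

```r
# Toy data frame reusing the first three rows printed by head(df) above
toy <- data.frame(
  price    = c(565, 365, 287),
  size     = c(160, 139, 86),
  bedrooms = c(4, 3, 3),
  age      = c(15, 6, 36)
)

# Select features and target by name instead of by position
feature_cols <- c("size", "bedrooms", "age")
x_toy <- as.matrix(toy[, feature_cols])

# drop = FALSE keeps the single column as a data frame, so the result is a
# one-column matrix rather than a plain vector
y_toy <- as.matrix(toy[, "price", drop = FALSE])

dim(x_toy)  # rows are examples, columns are features
dim(y_toy)  # one target value per example
```

Selecting by name is a little more verbose, but it keeps working if the columns of the source data are later reordered.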
The example data is now stored in matrix x_train. Each row of the matrix represents one training example. When you have \(m\) training examples and \(n\) features, x_train is a matrix with dimensions \((m,n)\). In this example the matrix has dimensions \((29938, 3)\).
dim(x_train)
## [1] 29938 3
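To make the \((m,n)\) layout concrete, here is a small sketch on a toy matrix built from the first three rows shown earlier by head(df). The names x_toy, m, n and x_2 are hypothetical and used only for illustration.

```r
# Toy feature matrix: one row per training example, one column per feature
x_toy <- matrix(
  c(160, 4, 15,
    139, 3,  6,
     86, 3, 36),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("size", "bedrooms", "age"))
)

m <- nrow(x_toy)  # number of training examples
n <- ncol(x_toy)  # number of features

# Indexing a single row returns that training example as a named vector
# of length n
x_2 <- x_toy[2, ]
x_2
```

The same indexing applies to x_train: x_train[i, ] is the feature vector of the i-th house.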